suppose assumption 1
SURF: Steering the Scalarization Weight to Uniformly Traverse the Pareto Front
Jiang, Liuyuan, Huang, Chentong, Chen, Lisha
Scalarization is widely used in multi-objective optimization owing to its simplicity and scalability. In many applications, the goal is to generate solutions that represent diverse user preferences, ideally with uniform coverage of the Pareto front (PF). However, uniformly sampling scalarization weights usually induces non-uniform coverage of the PF. We explain this mismatch through a geometric analysis of the scalarization path. As the scalarization weight varies, the corresponding solutions trace the PF with a generally non-uniform traversal speed. This speed induces an arc-length cumulative distribution function (CDF); inverting this CDF map yields a principled rule for selecting weights that produce uniform PF coverage. Building on this insight, we propose SURF (Sampling Uniformly along the PaReto Front). For structured problems, including bi-objective bandits, we derive closed-form expressions for this CDF map and the resulting PF-aware weight sampling rule. For general problems, SURF alternates between CDF reconstruction and weight sampling. Theoretically, we show that under provable conditions, SURF converges linearly to an unavoidable finite-sampling floor. Empirically, experiments on bandits, multi-objective-gymnasium, and multi-objective LLM alignment demonstrate that SURF efficiently achieves more uniform PF coverage than baselines.
Supplementary Materials AExpanded Related Work
A number of gradient-based bilevel algorithms have been proposed via AIDand ITD-based hypergradient approximations. For example, AID-based hypergradient computation [4, 33, 10, 11, 19] estimates the Hessian-inverse-vector product by solving a linear system with an efficient iterative algorithm. ITD-based hypergradient computation [31, 8, 9, 6, 35, 17] involves a backpropagation over the inner-loop gradient-based optimization path. Convergence rate of AIDand ITD-based bilevel methods has been studied recently. For example, [10, 19] and [19, 17] analyzed the convergence rate and complexity of AIDand ITD-based bilevel algorithms, respectively.
Heavy-Tailed and Long-Range Dependent Noise in Stochastic Approximation: A Finite-Time Analysis
Chandak, Siddharth, Yadav, Anuj, Ozgur, Ayfer, Bambos, Nicholas
Stochastic approximation (SA) is a fundamental iterative framework with broad applications in reinforcement learning and optimization. Classical analyses typically rely on martingale difference or Markov noise with bounded second moments, but many practical settings, including finance and communications, frequently encounter heavy-tailed and long-range dependent (LRD) noise. In this work, we study SA for finding the root of a strongly monotone operator under these non-classical noise models. We establish the first finite-time moment bounds in both settings, providing explicit convergence rates that quantify the impact of heavy tails and temporal dependence. Our analysis employs a noise-averaging argument that regularizes the impact of noise without modifying the iteration. Finally, we apply our general framework to stochastic gradient descent (SGD) and gradient play, and corroborate our finite-time analysis through numerical experiments.
High-probabilitycomplexityguaranteesfornonconvex minimaxproblems
To this end, high-probability guarantees have been considered in the literature [35, 64, 20, 32, 22]. These results allow to control the risk associated with the worst-case tail events as theyspecify howmanyiterations would be sufficient toensureG(xk,yk) issufficiently small foranygivenfailure probability q (0,1).